Department of Biostatistics, Johns Hopkins School of Public Health
Obtain joint distribution of acceleration and lag acceleration for a series of lags
Calculate scalar summaries of the joint distribution
I will walk through the process for one second, one person, and one lag
Intuition: walking is a cyclic process, and we want to leverage that cyclic nature.
Karas et al. (2021)
Hat tip to Edward Gunning for the idea for these figures
Toy example: 4 observations per second, 2 seconds, 1 individual
\(v_j(s)\): \(s^{th}\) acceleration observation in second \(j\)
data \[\begin{bmatrix} v_1(1) & v_1(2) & v_1(3) & v_1(4) \\ v_2(1) & v_2(2) & v_2(3) & v_2(4) \\ \end{bmatrix} \]
acceleration matrix \[\begin{bmatrix} \color{blue}{v_1(2)} & \color{blue}{v_1(3)} & \color{blue}{v_1(4)} & \color{teal}{v_1(3)} & \color{teal}{v_1(4)} & \color{violet}{v_1(4)} \\ \color{blue}{v_2(2)} & \color{blue}{v_2(3)} & \color{blue}{v_2(4)} & \color{teal}{v_2(3)} & \color{teal}{v_2(4)} & \color{violet}{v_2(4)} \\ \end{bmatrix} \] lag acceleration matrix \[\begin{bmatrix} \color{blue}{v_1(1)} & \color{blue}{v_1(2)} & \color{blue}{v_1(3)} & \color{teal}{v_1(1)} & \color{teal}{v_1(2)} &\color{violet}{ v_1(1)} \\ \color{blue}{v_2(1)} & \color{blue}{v_2(2)} & \color{blue}{v_2(3)} & \color{teal}{v_2(1)} & \color{teal}{v_2(2)} &\color{violet}{ v_2(1)} \\ \end{bmatrix} \]
lag matrix \[\begin{bmatrix} \color{blue}{1} & \color{blue}{1} & \color{blue}{1} & \color{teal}{2} & \color{teal}{2} & \color{violet}{3}\\ \color{blue}{1} & \color{blue}{1} & \color{blue}{1} & \color{teal}{2} & \color{teal}{2} & \color{violet}{3}\\\end{bmatrix} \]
Number of columns: \(4 \cdot (4-1) / 2 = 6\), i.e. \(S(S-1)/2\) for \(S\) observations per second
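The construction of the three matrices above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_lag_matrices` and the string labels used in place of real acceleration values are assumptions made for this example.

```python
def build_lag_matrices(data):
    """Build acceleration, lag acceleration, and lag matrices.

    data: list of rows, one row per second, each holding S observations.
    Returns three matrices with one row per second and S * (S - 1) / 2 columns.
    """
    n_obs = len(data[0])  # S, observations per second
    accel_mat, lag_accel_mat, lag_mat = [], [], []
    for row in data:
        accel_row, lag_accel_row, lag_row = [], [], []
        for lag in range(1, n_obs):      # lags u = 1, ..., S - 1
            for s in range(lag, n_obs):  # 0-based position of the later observation
                accel_row.append(row[s])
                lag_accel_row.append(row[s - lag])
                lag_row.append(lag)
        accel_mat.append(accel_row)
        lag_accel_mat.append(lag_accel_row)
        lag_mat.append(lag_row)
    return accel_mat, lag_accel_mat, lag_mat


# Toy example: 2 seconds, 4 observations per second, labels instead of numbers.
toy = [[f"v{j}({s})" for s in range(1, 5)] for j in (1, 2)]
accel_mat, lag_accel_mat, lag_mat = build_lag_matrices(toy)
```

Each row of the output corresponds to one second, and the columns are ordered by lag, matching the colour grouping in the matrices above.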
We’re familiar with predicting a (scalar) outcome using a (scalar) predictor
\[Y_i = \alpha + \beta X_i + \epsilon_i\]
Sometimes our predictor is a curve over time (i.e., a function) rather than a single value.
We can model an outcome as a function of this curve:
\[Y_i = \alpha + \int X_i(t)\beta(t)dt + \epsilon_i\]
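In practice the curve is observed on a grid, so the integral is approximated by a sum. The sketch below uses a left Riemann sum; the function name `linear_predictor`, the grid, and the particular choices of \(X_i(t)\) and \(\beta(t)\) are assumptions made purely for illustration.

```python
def linear_predictor(alpha, x_values, beta_values, dt):
    """Approximate alpha + integral of X(t) * beta(t) dt by a Riemann sum."""
    return alpha + sum(x * b for x, b in zip(x_values, beta_values)) * dt


# Hypothetical example on [0, 1): X(t) = 1 and beta(t) = 2t, whose integral is 1.
n_grid = 1000
grid = [k / n_grid for k in range(n_grid)]
x_values = [1.0 for _ in grid]
beta_values = [2.0 * t for t in grid]
eta = linear_predictor(alpha=0.5, x_values=x_values, beta_values=beta_values, dt=1.0 / n_grid)
```

With a finer grid the sum approaches the exact value of \(\alpha + 1\) for this choice of functions.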
Model outcomes as:
\[Y_{ij}^{i_0}\sim\text{Bernoulli}(p_{ij}^{i_0})\]
where \(Y_{ij}^{i_0} = 1\) if subject \(i\) in second \(j\) belongs to subject \(i_0\), and 0 otherwise
Model:
\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}dsdu \]
\(u = 1, \dots, S\), where \(S = 100\) is the number of observations per second
\(v_{ij}(s)\) = acceleration at centisecond \(s\) for subject \(i\) in second \(j\)
\(F(\cdot, \cdot, \cdot)\): trivariate smooth function, takes values at every point in the domain of acceleration, lag acceleration, lag length
Fit using penalized splines with a quadratic penalty on the functional coefficient (Wood 2016)
\(\texttt{te()}\): tensor product smooth
\(\texttt{k = c(5, 5, 5)}\): number of basis functions for each dimension of the tensor product smooth
\(\texttt{weight\_mat}\): matrix of weights for the linear functional of the smooth term. We use equal weights, so every entry is \(\texttt{1/ncol(accel\_mat)}\)
\(\texttt{method="REML"}\): smoothing parameter selection with restricted maximum likelihood
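To make the role of the equal weights concrete, here is a rough sketch of how a discretized version of the model turns one row of the three matrices into a probability. It assumes the smooth enters the linear predictor as a weighted sum over matrix columns. The surface `toy_surface` is a hypothetical stand-in for the fitted \(F_{i_0}\), and `predicted_probability` is a name invented for this example; neither comes from the original analysis.

```python
import math


def toy_surface(accel, lag_accel, lag):
    """Hypothetical stand-in for the fitted trivariate smooth F."""
    return 0.1 * accel * lag_accel - 0.05 * lag


def predicted_probability(beta0, accel_row, lag_accel_row, lag_row, surface):
    """Inverse logit of beta0 plus the equally weighted sum of surface values."""
    weight = 1.0 / len(accel_row)  # equal weights, mirroring 1/ncol(accel_mat)
    functional = sum(
        weight * surface(a, l, u)
        for a, l, u in zip(accel_row, lag_accel_row, lag_row)
    )
    eta = beta0 + functional
    return 1.0 / (1.0 + math.exp(-eta))


# One second with two (acceleration, lag acceleration, lag) triples.
p = predicted_probability(-0.1, [1.0, 2.0], [1.0, 1.0], [1, 1], toy_surface)
```

With equal weights the double integral reduces to the average of the surface over all (acceleration, lag acceleration, lag) triples observed in that second.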
Rank-1 (rank-5) % accuracies
Capturing gold standard steps in one dataset
Open-source algorithms and 3 datasets with gold-standard step counts
How many steps do Americans take per day, on average?
Do estimates differ by algorithm?
Are more steps associated with lower mortality risk?
Do males take more steps than females? At what points during the day?
Outcome: steps profile over the course of the day (function)
Predictors: age, sex (scalars)
Model: \[\mathbb{E}[\mathrm{steps}_i(s)] = \beta_0(s) + \beta_1(s)\mathbb{I}\{\mathrm{sex}_i=\mathrm{female}\} + \beta_2(s)(\mathrm{age}_i-50) \] \(i\): participant; \(s \in \{1, \dots, 1440\}\): each minute of the day
Interpretation
Fast univariate inference (FUI) (Cui et al. 2021)
Fit a separate (univariate) GLM at each point \(s\), then smooth the resulting point estimates
Bootstrap subjects to get confidence bands
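The three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the FUI implementation: it uses ordinary least squares with a single scalar predictor in place of a general GLM, a moving average in place of a penalized smoother, and pointwise percentile bands. All function names are invented for this example.

```python
import random


def pointwise_slopes(curves, x):
    """OLS slope of curves[i][s] on the scalar x[i], fit separately at each s."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for s in range(len(curves[0])):
        y = [curve[s] for curve in curves]
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes


def moving_average(values, half_window=2):
    """Smooth the pointwise estimates across s with a simple moving average."""
    smoothed = []
    for s in range(len(values)):
        window = values[max(0, s - half_window): s + half_window + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def bootstrap_band(curves, x, n_boot=200, seed=1):
    """Resample subjects with replacement and take pointwise 2.5%/97.5% quantiles."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        x_boot = [x[i] for i in idx]
        if len(set(x_boot)) < 2:  # skip resamples with no variation in x
            continue
        curves_boot = [curves[i] for i in idx]
        estimates.append(moving_average(pointwise_slopes(curves_boot, x_boot)))
    lower, upper = [], []
    for s in range(len(curves[0])):
        column = sorted(est[s] for est in estimates)
        lower.append(column[int(0.025 * n_boot)])
        upper.append(column[int(0.975 * n_boot) - 1])
    return lower, upper


# Hypothetical data: 6 subjects, 10 time points, true coefficient 2 at every s.
x = [0, 1, 0, 1, 1, 0]
curves = [[2.0 * xi + s for s in range(10)] for xi in x]
beta_hat = moving_average(pointwise_slopes(curves, x))
lower, upper = bootstrap_band(curves, x)
```

Resampling whole subjects, rather than individual time points, keeps each subject's curve intact and so preserves the within-subject correlation across \(s\).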
BUT: NHANES is not a simple random sample
Individuals sampled in geographic clusters, minority groups oversampled
Are estimates valid for population-level inference?
No method exists for survey-weighted function-on-scalar regression
For standard regression, there are well-developed methods and software that account for weights and correlation within clusters (e.g. \(\texttt{svyglm}\), \(\texttt{svycoxph}\)) (Lumley 2010)
FUI is built on separate GLMs
Idea: incorporate survey weights into the GLMs and use survey-aware replication/bootstrap methods for inference
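As a rough sketch of the point-estimation half of this idea, the pointwise fit can be replaced by a weighted least squares fit that uses the survey weights. The function name, the single scalar predictor, and the toy numbers below are assumptions for illustration only. Design-based variance estimation (the replication or bootstrap step) is not shown.

```python
def weighted_pointwise_slopes(curves, x, weights):
    """Weighted least squares slope of curves[i][s] on x[i], separately at each s."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slopes = []
    for s in range(len(curves[0])):
        y = [curve[s] for curve in curves]
        y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
        sxy = sum(
            w * (xi - x_bar) * (yi - y_bar)
            for w, xi, yi in zip(weights, x, y)
        )
        slopes.append(sxy / sxx)
    return slopes


# Hypothetical data: 4 subjects, 1 time point, unequal survey weights.
x = [0, 0, 1, 1]
curves = [[1.0], [3.0], [5.0], [9.0]]
weights = [1.0, 3.0, 1.0, 1.0]
weighted_slopes = weighted_pointwise_slopes(curves, x, weights)
unweighted_slopes = weighted_pointwise_slopes(curves, x, [1.0] * 4)
```

For a binary predictor the weighted slope equals the difference between the two groups' weighted means, so subjects with large weights pull the estimate toward the part of the population they represent.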
First simulation study to evaluate function-on-scalar regression in complex survey settings